Molecular Systems Biology
○ Springer Science and Business Media LLC
All preprints, ranked by how well they match Molecular Systems Biology's content profile, based on 142 papers previously published here. The average preprint has a 0.06% match score for this journal, so anything above that is already an above-average fit. Older preprints may already have been published elsewhere.
Sorokina, O.; McLean, C.; Croning, M. D.; Heil, K. F.; Wysochka, E.; He, X.; Sterratt, D. C.; Grant, S.; Simpson, I.; Armstrong, J. D.
Synapses contain highly complex proteomes which control synaptic transmission, cognition and behaviour. Genes encoding synaptic proteins are associated with neuronal disorders, many of which show clinical co-morbidity. Our hypothesis is that there is mechanistic overlap that is emergent from the network properties of the molecular complex. Testing this requires a detailed and comprehensive molecular network model. We integrated 57 published synaptic proteomic datasets obtained between 2000 and 2019 that describe over 7000 proteins. The complexity of the postsynaptic proteome is reaching an asymptote with a core set of ~3000 proteins, with less data on the presynaptic terminal, where each new study reveals new components in its landscape. To complete the network, we added direct protein-protein interaction data and functional metadata including disease association. The resulting amalgamated molecular interaction network model is embedded into a SQLite database. The database is highly flexible, allowing the widest range of queries to derive custom network models based on meta-data including species, disease association, synaptic compartment, brain region, and method of extraction. This network model enables us to perform in-depth analyses that dissect molecular pathways of multiple diseases, revealing shared and unique protein components. We can clearly identify common and unique molecular profiles for co-morbid neurological disorders such as Schizophrenia and Bipolar Disorder, and even disease comorbidities which span biological systems, such as the intersection of Alzheimer's Disease with Hypertension.
Zühlke, B. M.; Sokolowska, E. M.; Luzarowski, M.; Schlossarek, D.; Chodasiewicz, M.; Leniak, E.; Skirycz, A.; Nikoloski, Z.
Metabolite-protein interactions affect and shape diverse cellular processes. Yet, despite advances, approaches for identifying metabolite-protein interactions at a genome-wide scale are lacking. Here we present an approach termed SLIMP that predicts metabolite-protein interactions using supervised machine learning on features engineered from metabolic and proteomic profiles from a co-fractionation mass spectrometry-based technique. By applying SLIMP with gold standards, assembled from public databases, along with metabolic and proteomic data sets from multiple conditions and growth stages, we predicted over 9,000 and 20,000 metabolite-protein interactions for Saccharomyces cerevisiae and Arabidopsis thaliana, respectively. Extensive comparative analyses corroborated the quality of the predictions from SLIMP with respect to widely-used performance measures (e.g. F1-score exceeding 0.8). SLIMP predicted novel targets of 2',3'-cyclic nucleotides and dipeptides, which we analysed comparatively between the two organisms. Finally, predicted interactions for the dipeptide Tyr-Asp in Arabidopsis and the dipeptide Ser-Leu in yeast were independently validated, opening the possibility for future applications of supervised machine learning approaches in this area of systems biology.
Innes, B. T.; Bader, G. D.
Cell-cell interactions are often predicted from single-cell transcriptomics data based on observing receptor and corresponding ligand transcripts in cells. These predictions could theoretically be improved by inspecting the transcriptome of the receptor cell for evidence of gene expression changes in response to the ligand. It is commonly expected that a given receptor, in response to ligand activation, will have a characteristic downstream gene expression signature. However, this assumption has not been well tested. We used ligand perturbation data from both the high-throughput Connectivity Map resource and published transcriptomic assays of cell lines and purified cell populations to determine whether ligand signals have unique and generalizable transcriptional signatures across biological conditions. Most of the receptors we analyzed did not have such characteristic gene expression signatures - instead these signatures were highly dependent on cell type. Cell context is thus important when considering transcriptomic evidence of ligand signaling, which makes it challenging to build generalizable ligand-receptor interaction signatures to improve cell-cell interaction predictions.
Beyer, A.; Weith, M.; Grossbach, J.; Clement-Ziza, M.; Gillet, L.; Rodriguez-Lopez, M.; Picotti, P.; Bähler, J.; Rudolf, A.; Marguerat, S.; Workman, C. T.
The complexity of many cellular and organismal traits results from poorly understood mechanisms integrating genetic and environmental factors via molecular networks. Here, we show when and how genetic perturbations lead to molecular changes that are confined to small parts of a network versus when they lead to large-scale adaptations of global network states. Integrating multi-omics profiling of genetically heterogeneous budding and fission yeast strains with an array of cellular traits identified a central state transition of the yeast molecular network that is related to PKA and TOR (PT) signaling. Genetic variants affecting this PT state globally shifted the molecular network along a single-dimensional axis, thereby modulating processes including energy- and amino acid metabolism, transcription, translation, cell cycle control and cellular stress response. We propose that genetic effects can propagate through large parts of molecular networks because of the functional requirement to centrally coordinate the activity of fundamental cellular processes.
Graphical abstract: Genetic variants directly or indirectly affecting the activity of PKA and/or TOR signaling cause global changes of transcriptomic and proteomic network states by modulating the activity of diverse cellular functions and network modules. Using marker genes acting downstream of PKA and TOR signaling we are able to quantify the activity status of combined PKA and TOR signaling (PT Score). This PT Score correlates with major transcriptomic and proteomic changes in response to genetic variability. Those large-scale molecular adaptations correlate with and explain phenotypic consequences for multiple cellular traits. Variants affecting the stoichiometry of proteins within a specific module have regional effects that remain confined to smaller parts of the molecular network. Variants affecting only one or very few proteins change molecular networks only locally. The global reorganization of network states caused by variants of the first type results in consequences for many cellular traits (i.e. pleiotropic effects), such as growth on different carbon sources, stress response, energy metabolism and replicative lifespan. (Created with BioRender.com)
Brunnsaker, D.; Gower, A. H.; Naval, P.; Bjurström, E. Y.; Kronström, F.; Tiukova, I. A.; King, R. D.
Automation is transforming scientific discovery by enabling systematic exploration of complex hypotheses. Large language models (LLMs) perform well across diverse tasks and promise to accelerate research, but often struggle to interact with logical structures. Here we present a framework integrating LLM-based agents with laboratory automation, guided by a logical scaffold incorporating symbolic relational learning, structured vocabularies, and experimental constraints. This integration reduces output incoherence and improves reliability in automated workflows. We couple this AI-driven approach to automated cell-culture and metabolomics platforms, enabling hypothesis validation and refinement, yielding a flexible system for scientific discovery. We validate the system in Saccharomyces cerevisiae, identifying novel interactions, including glutamate-induced synergistic growth inhibition in spermine-treated cells and aminoadipate's partial rescue of formic-acid stress. All hypotheses, experiments, and data are captured in a graph database employing controlled vocabularies. Existing ontologies are extended, and a novel representation of scientific hypotheses is presented using description logics. This work enables a more reliable, machine-driven discovery process in systems biology.
Tangirala, S.; Isaac, S.; Gehad, Y.; Roquefort, F.; Gabrieli, P.; Miller, G.; Thaker, V. V.; Tierney, B. T.; Patel, C. J.
Mapping patient progression from a healthy metabolic state to Type 2 Diabetes (T2D) provides opportunities for precision medicine-driven preventative interventions. We constructed the Metabolic Atlas of Progression to Diabetes (MAP-D), leveraging proteomic data on 2,923 proteins measured in a median of 47,963 UK Biobank participants to compute associations with hallmarks of metabolic disease (body mass index, HDL, LDL, triglyceride to HDL ratio [TRIG/HDL], systolic and diastolic blood pressure, and glycated hemoglobin A1C [HbA1c]) in individuals with normoglycemia, prediabetes, and type 2 diabetes. The MAP-D contains proteomic signatures that discriminate between patient subpopulations (e.g., individuals with obesity and normoglycemia, individuals with obesity and T2D) along known (e.g., leptin [LEP], growth hormone receptor [GHR]) and potentially unexplored (e.g., B-cell differentiation antigen [CD72], ADAMTS-like protein 2 [ADAMTSL2]) axes of disease. MAP-D proteins improved prediction of BMI, HDL, LDL, TRIG/HDL ratio and HbA1c in T2D compared to demographics alone (full model R² of up to 0.8; ΔR² up to 0.7). Further, we integrated the MAP-D with proteomic data from semaglutide (GLP-1 receptor agonist [GLP1RA]) intervention trials and found signatures of therapeutic efficacy and reversion to a healthy metabolic state. A subset of proteins were "therapeutically intransient", or were associated with metabolic disease but not affected by semaglutide. This suggests that divergent pathogenic pathways that contain proteins (e.g., EGFR) are associated with future cardiovascular, kidney, or liver complications. More importantly, these proteins are known therapeutic targets of approved drugs (e.g., nitroglycerin), indicating that combined GLP1RA therapies may yield better disease outcomes. In total, we propose the MAP-D as a resource for characterizing circulating metabolic disease pathways and improving disease management.
The atlas is available as a resource at https://btierneyshiny.shinyapps.io/mapd-visualizer/.
Grzes, M.; Jaiswar, A.; Grochowski, M.; Wojtys, W.; Kazmierczak, W.; Olesinski, T.; Lenarcik, M.; Nowak-Niezgoda, M.; Kolos, M.; Canarutto, G.; Piazza, S.; Wisniewski, J. R.; Walerych, D.
Major driver oncogenes CMYC, mutant KRAS and mutant TP53 often co-exist and cooperate in promoting human neoplasia. By CRISPR-Cas9-mediated downregulation we determined their downstream proteomic and transcriptomic programs in a panel of cell lines with either a single oncogene or all three oncogenes activated, in cancers of the lung, colon and pancreas. This allowed us to define the oncogenes' common functional program, screen it for anti-cancer target candidates, and find protocols which efficiently kill cancer cells and organoids by targeting pathways represented by a signature of three genes: RUVBL1, HSPA9 and XPO1. We found that these genes were controlled by the driver oncoproteins in a redundant or competitive manner, rather than by cooperation. Each oncoprotein individually was able to upregulate the three target genes, while upon oncogene co-expression each target was controlled preferentially by a specific oncoprotein, which reduced the influence of the others. Mechanistically, this redundancy was mediated by parallel routes of target gene activation, as in the case of mutant KRAS signaling to C-JUN and GLI-2 transcription factors bypassing CMYC, and by competition, as in the case of mutant p53 and CMYC competing for binding to the target promoters. The transcriptomics data from the cell lines and patient samples indicate that the redundancy of the oncogenic programs is a broad phenomenon which may comprise even a majority of the genes dependent on the oncoprotein, as shown for mutant p53 in colon and lung cancer cell lines. Nevertheless, we demonstrate that the redundant oncogene programs harbor targets of efficient anti-cancer drug combinations, bypassing limitations of direct oncoprotein inhibition.
Pastva, S.; Safranek, D.; Benes, N.; Brim, L.; Henzinger, T.
Recent developments in both computational analysis and data-driven synthesis enable a new era of automated reasoning with logical models (Boolean networks in particular) in systems biology. However, these advancements also motivate an increased focus on quality control and performance comparisons between tools. At the moment, to illustrate real-world applicability, authors typically test their approaches on small sets of manually curated models that are inherently limited in scope. This further complicates reuse and comparisons, because benchmark models often contain ad hoc modifications or are outright not available. In this paper, we describe a new, comprehensive, open source dataset of 210+ Boolean network models compiled from available databases and a literature survey. The models are available in a wide range of formats. Furthermore, the dataset is accompanied by a validation pipeline that ensures the integrity and logical consistency of each model. Using this pipeline, we identified and repaired 400+ potential problems in a number of widely used models.
Sharma, R.; Meimetis, N.; Begzati, A.; Nagar, S. D.; Kellman, B.; Baghdassarian, H. M.
A central goal of conducting omics measurements is to understand how molecular features inform higher-order cell- and tissue-level phenotypes. In particular, multi-omics offers insights into how information encoded by the genome is coordinated through biological layers, resulting in functional outputs. Due to myriad post-transcriptional regulatory processes, the coordination between mRNA and protein cannot be simply reduced to gene-wise correlation. Yet, both modalities have been shown to serve as representations of biological state, and multi-omics integration has been used to improve these representations. Multi-omics approaches typically do not focus on how mRNA and protein features coordinate, but rather use the additional information for improved prediction or feature selection. Here, instead, we show that standard linear machine learning models provide an understanding of transcriptomic and proteomic coordination in the context of a biological phenotype of interest, in this case cancer metastasis. We find that, in the context of metastasis, a select subset of proteomic features, reflecting a more concentrated signal relative to the broadly distributed transcriptomic signal, offers additional information to that encoded by transcriptomics, as demonstrated by improved model performance when integrating the two modalities and the relative feature importance of proteomics. Top features show a depletion of gene-product overlap across modalities, indicating that the model primarily leverages instances in which the two modalities are providing complementary information with respect to phenotype. However, in instances when both modalities are selected for a given gene product, there is high information consistency that synergistically bolsters phenotype prediction.
Altogether, by using model fits that relate both modalities to phenotype, we observe a nuanced coordination of protein and mRNA, in which both modalities tend to provide consistent information about phenotype, yet there remain benefits to incorporating a combination of both complementary and reinforcing signals across modalities.
Le Treut, G.; Si, F.; Li, D.; Jun, S.
The reference point for cell-size control in the cell cycle is a fundamental biological question. We previously reported that we were unable to reproduce the conclusions of Witz et al.'s eLife paper (Witz, van Nimwegen, and Julou 2019) entitled "Initiation of chromosome replication controls both division and replication cycles in E. coli through a double-adder mechanism", despite extensive efforts. In this replication double adder (RDA) model, both replication and division cycles are determined via replication initiation as the sole implementation point of size control. Witz et al. justified the RDA model using a type of correlation analysis (the "I-value analysis") that they developed. By contrast, we previously showed that, in both Escherichia coli and Bacillus subtilis, replication initiation and cell division are determined by balanced biosynthesis of key cell cycle proteins (e.g., DnaA for initiation and FtsZ for cell division) and their accumulation to their respective threshold numbers, which Witz et al. coined the independent double adder (IDA) model. The adder phenotype is a natural quantitative consequence of these mechanistic principles. In a recent bioRxiv response to our report, Witz and colleagues explicitly confirmed two important limitations of the I-value analysis: (1) it is only applicable to non-overlapping cell cycles, wherein E. coli is known to deviate from the adder principle, and (2) it is only applicable to select biological models and, for example, cannot evaluate the IDA model. These limitations of the I-value analysis were not explained in the original eLife paper and were overlooked during the review process. In this report, we show using data analysis, mathematical modeling, and experiments why the I-value analysis, in its current implementation, cannot compare different biological models. Furthermore, the RDA model is incompatible with the adder principle and is not broadly supported by experimental data.
For completeness, we also provide a detailed point-by-point response to Witz et al.'s response (Witz, Julou, and van Nimwegen 2020) in the Supplemental Information.
Basile, A.; Heinken, A.; Hertel, J.; Smarr, L.; Li, W.; Treu, L.; Valle, G.; Campanero, S.; Thiele, I.
Inflammatory bowel diseases (IBD) are characterised by episodic inflammation of the gastrointestinal tract. Gut microbial dysbiosis characterises the pathoetiology, but its role remains understudied. We report the first use of constraint-based microbial community modelling on a single individual with IBD, covering seven dates over 16 months, enabling us to identify a number of time-correlated microbial species and metabolites. We find that the individual's dynamical microbial ecology in the disease state drives time-varying in silico overproduction, compared to healthy controls, of more than 24 biologically important metabolites, including oxygen, methane, thiamine, formaldehyde, trimethylamine N-oxide, folic acid, serotonin, histamine, and tryptamine. A number of these metabolites may yield new biomarkers of disease progression. The microbe-metabolite contribution analysis revealed that some genus Dialister species changed metabolic pathways according to the disease phases. At the first time point, characterised by the highest levels of blood and faecal inflammation biomarkers, they produced L-serine or formate. The production of the compounds, through a cascade effect, was mediated by the interaction with pathogenic Escherichia coli strains and Desulfovibrio piger. We integrated the microbial community metabolic models of each time point with a male whole-body, organ-resolved model of human metabolism to track the metabolic consequences of dysbiosis at different body sites. The presence of D. piger in the gut microbiome influenced the sulphur metabolism with a domino effect affecting the liver. These results underline the importance of tracking an individual's gut microbiome along a time course, creating a new analysis framework for self-quantified medicine.
Yue, L.; Jiang, W.; Li, S.; Luo, M.; Fan, N.; Zhan, X.; Chen, K.; Lu, T.; Guo, F.; Li, D.; Ge, W.; Nie, Z.; Lyu, M.; A, J.; Wang, Y.; Chen, Y.; Fu, Z.; Xiang, N.; Li, L.; Yu, F.; Teo, G. C.; Nesvizhskii, A.; Wang, M.; Snyder, M.; Collins, B. C.; Aebersold, R.; Xu, F.; Liu, T.; Li, Y.; Guo, T.
A comprehensive spatial distribution of the proteome in the human body and cancers is fundamental for understanding human biology and diseases including cancers. Here, we present an anatomically resolved human proteome derived from 1781 benign and malignant samples from 58 major tissue types encompassing 251 specific tissues and 25 carcinomas. Based on a spectral library covering over 75% of the human protein-coding genes, with 208 understudied and 82 missing proteins characterized, we quantified over 13,000 proteins in these samples using data-independent acquisition proteomics. This data resource presents the most comprehensive quantitative proteomic landscape of human tissues and common carcinomas to date. It allows systematic evaluation of tissue-specific drug responses, identification of drug candidates that may be repurposed as antineoplastics, and discovery of novel targets for anticancer therapy. This resource, available as an online knowledgebase, refines our knowledge of the spatial distribution of the human proteome and tumor-specific protein modulation.
Laman Trip, D. S.; van Oostrum, M.; Memon, D.; Frommelt, F.; Baptista, D.; Panneerselvam, K.; Bradley, G.; Licata, L.; Hermjakob, H.; Orchard, S.; Trynka, G.; McDonagh, E.; Fossati, A.; Aebersold, R.; Gstaiger, M.; Wollscheid, B.; Beltrao, P.
Proteins that interact together participate in the same cellular process and influence the same organismal traits. Despite the progress in mapping protein-protein interactions we lack knowledge of how they differ between tissues. Due to coordinated (post)transcriptional control, protein complex members have highly correlated abundances that are predictive of functional association. Here, we have compiled 7873 proteomic samples measuring protein levels in 11 human tissues and use these to define an atlas with tissue-specific protein associations. This method recapitulates known protein complexes and the larger structural organization of the cell. Interactions of stable protein complexes are well preserved across tissues, while signaling and metabolic interactions show larger variation. Further, we find that less than 18% of differences between tissues are estimated to be due to differences in gene expression while cell-type specific cellular structures, such as synaptic components, represent a significant driver of differences between tissues. We further supported the brain protein association network through co-fractionation experiments in synaptosomes, curation of brain derived pull-down data and AlphaFold2 models. Together these results illustrate how this brain specific protein interaction network can functionally prioritize candidate genes within loci linked to brain disorders.
Bertaux, F.; Kleijn, I. T.; Marguerat, S.; Shahrezaei, V.
Steady-state cell size and geometry depend on growth conditions. Here, we use an experimental setup based on continuous culture and single-cell imaging to study how cell volume, length, width and surface-to-volume ratio vary across a range of growth conditions including nitrogen and carbon titration, the choice of nitrogen source, and translation inhibition. Overall, we find cell geometry is not fully determined by growth rate and depends on the specific mode of growth rate modulation. However, under nitrogen and carbon titrations, we observe that the cell volume and the growth rate follow the same linear scaling.
Figure: A. Graphical outline of the growth and imaging assays. B. Illustration of the procedure used to extract cell size and geometry data. C. Average surface-to-volume (S/V) ratio plotted as a function of average cell width across all steady-state cultures. The dark grey circle indicates the base growth medium, EMM2. Cultures limited in their growth by the concentration of ammonium, glucose, and cycloheximide in the medium are indicated with cyan lozenges, green triangles, and orange squares, respectively. Cultures limited by the choice of nitrogen source are indicated with light grey circles; the amino-acid nitrogen source used is labelled with its three-letter abbreviation, using Amm for the equivalent culture grown with ammonium chloride as its sole nitrogen source. D. Average cell volume plotted against the growth rate across all cultures, showing collapse for ammonium- and glucose-limited cultures and differing behaviour for nitrogen-source- and translation-limited cultures. Plotted in dark grey is a linear fit to the ammonium- and glucose-limited data, including the base medium, with 95% confidence interval; a similar fit (without CI) to the cycloheximide-limited cultures is indicated by a dashed orange line. E. Average surface-to-volume (S/V) ratio against growth rate across all cultures, with dashed lines for ammonium-, glucose-, and translation-limited cultures representing linear fits to the respective data, each including the base medium. Under ammonium limitation, the surface-to-volume ratio increases markedly as the growth rate decreases. A moderate increase is observed for glucose limitation and a moderate decrease is observed for protein synthesis inhibition with cycloheximide. There is no consistent trend with the growth rate when the quality of the nitrogen source is varied. F. Average surface area, G. cell length, H. cell width against growth rate, showing that ammonium-limited cells are thinner than glucose-limited cells at equivalent growth rates, consistent with their different surface-to-volume ratios and the relation between surface-to-volume ratio and cell width (see C).
Davtian, D.; Dupuis, T.; Mansour Aly, D.; Atabaki Pasdar, N.; Walker, M.; Franks, P. W.; Rutters, F.; Im, H. K. W.; Pearson, E. R.; van de Bunt, M.; Vinuela, A.; Brown, A.
Identification of genes and proteins mediating the activity of GWAS variants requires molecular data from disease relevant tissues, but these may be difficult to collect. Using multiple gene expression reference datasets and GWAS summary statistics for T2D we identified 1,818 unique genes associated with T2D. Comparing the performance of different reference datasets, we found that sample size, and not the relevance of the tissue to the disease, was the critical factor in identifying relevant genes. Genes implicated using a well powered expression dataset were also more likely to have multiple lines of genetic evidence. A targeted proteomics reference dataset from plasma samples showed similar power to identify T2D related proteins as gene expression with the same sample size. Accounting for BMI reduces power across all tissues and phenotypes by ~30%, suggesting that many GWAS links to T2D are mediated by BMI, potentially implicating insulin resistance related effects. Finally, using data from smaller GWAS studies with precisely defined T2D subtypes uncovers genes directly relevant to that subtype, such as LST1, an immune response gene for Severe Autoimmune Diabetes and TRMT2A, involved in beta-cell apoptosis, for Severe Insulin Deficient Diabetes. Our work demonstrates the benefits of well powered reference datasets in accessible tissues and well-defined disease subtypes when studying complex diseases involving multiple tissues.
Halama, A.; Zaghlool, S.; Thareja, G.; Kader, S.; Al Muftha, W.; Mook-Kanamori, M.; Sarwath, H.; Ali Mohamoud, Y.; Ameling, S.; Pucic Bakovic, M.; Krumsiek, J.; Prehn, C.; Adamski, J.; Friedrich, N.; Voelker, U.; Wuhrer, M.; Lauc, G.; Najafi, H.; Malek, J. A.; Graumann, J.; Mook-Kanamori, D.; Schmidt, F.; Suhre, K.
In-depth multiomics phenotyping can provide a molecular understanding of complex physiological processes and their pathologies. Here, we report on the application of 18 diverse deep molecular phenotyping (omics-) technologies to urine, blood, and saliva samples from 391 participants of the multiethnic diabetes study QMDiab. We integrated quantitative readouts of 6,304 molecular traits with 1,221,345 genetic variants, methylation at 470,837 DNA CpG sites, and gene expression of 57,000 transcripts using between-platform mutual best correlations, within-platform partial correlations, and genome-, epigenome-, transcriptome-, and phenome-wide associations. The achieved molecular network covers over 34,000 statistically significant trait-trait links and illustrates "The Molecular Human". We describe the variances explained by each omics layer in the phenotypes age, sex, BMI, and diabetes state, platform complementarity, and the inherent correlation structures of multiomics. Finally, we discuss biological aspects of the networks relevant to the molecular basis of complex disorders. We developed a web-based interface to "The Molecular Human", which is freely accessible at http://comics.metabolomix.com and allows dynamic interaction with the data.
Gobet, C.; Weger, B.; Marquis, J.; Martin, E.; Gachon, F.; Naef, F.
Protein translation depends on mRNA-specific initiation, elongation and termination rates. While the regulation of ribosome elongation is well studied in bacteria and yeast, less is known in higher eukaryotes. Here, we combined ribosome and tRNA profiling to investigate the relations between ribosome elongation rates, (aminoacyl-) tRNA levels and codon usage in mammals. We modeled codon-specific ribosome dwell times and translation fluxes from ribosome profiling, considering pair-interactions between ribosome sites. In mouse liver, the model revealed site and codon specific dwell times, as well as codon pair-interactions clustering by amino acids. While translation fluxes varied significantly across diurnal time and feeding regimen, codon dwell times were highly stable, and conserved in human. Fasting had no effect on codon dwell times in mouse liver. Profiling of total and aminoacyl-tRNAs revealed highly heterogeneous levels that correlated with codon usage and showed specific isoacceptor patterns. tRNAs for several amino acids were lowly loaded, which was conserved in fasted mice. Finally, codons with low levels of charged tRNAs and high codon usage relative to tRNA abundance exhibited long dwell times. Together, these analyses pave the way towards understanding the complex interactions between tRNA loading, codon usage and ribosome dwell times in mammals.
Gligorovski, V.; Sadeghi, A.; Rahi, S. J.
For quantitative systems biology, simultaneous readout of multiple cellular processes as well as precise, independent control over different genes' activities are essential. In contrast to readout systems such as fluorescent proteins, control systems such as inducible transcription-factor-promoter systems have only been characterized in an ad hoc fashion, impeding precise system-level manipulations of biological systems and reliable modeling. We designed and performed systematic benchmarks involving easy-to-communicate units to characterize and compare inducible transcriptional systems. We built a comprehensive single-copy library of inducible systems controlling standardized fluorescent protein expression in budding yeast, including GAL1pr, GALL, MET3pr, CUP1pr, PHO5pr, tetOpr, terminator-tetOpr, Z3EV system, the blue-light optogenetic systems El222-LIP, El222-GLIP and the red-light inducible PhyB-PIF3 system. To analyze these systems' dynamic properties, we performed high-throughput time-lapse microscopy. The analysis of >100 000 cell images was made possible by the recently developed convolutional neural network YeaZ. We report key kinetic parameters, scaling of noise levels, impacts on growth, and, crucially, the fundamental leakiness of each system. Our multidimensional benchmarking additionally uncovers unexpected disadvantages of widely used tools, e.g., nonmonotonic activity of the MET3 and GALL promoters, slow off kinetics of the doxycycline and estradiol-inducible systems tetOpr and Z3EV, and high variability of PHO5pr and the red-light activated PhyB-PIF3 system. We introduce two new tools for controlling gene expression: strongLOV, a more light-sensitive El222 mutant, and ARG3pr that functions as an OR gate induced by the lack of arginine or presence of methionine.
To demonstrate the ability to finely control genetic circuits, we experimentally tuned the time between cell cycle Start and mitotic entry in budding yeast, artificially simulating near-wild-type timing. The characterizations presented here define the compromises that need to be made for quantitative experiments in systems and synthetic biology. To calibrate perturbations across laboratories and to allow new inducible systems to be benchmarked, we deposited single-copy reporter yeast strains, plasmids, and computer analysis code in public repositories. Furthermore, this resource can be accessed and expanded through the website https://promoter-benchmark.epfl.ch/.
Hall, M.; Küeltz, D.; Almaas, E.
Using abundance measurements of 1,490 proteins from four separate populations of three-spined sticklebacks, we implemented a system-level approach to correlate proteome dynamics with environmental salinity and temperature and the fish's population and morphotype. We identified sets of robust and accurate fingerprints that predict environmental salinity, temperature, morphotype and the population sample origin, observing that proteins with specific functions are enriched in these fingerprints. Highly apparent functions represented in all fingerprints include ion transport, proteostasis, growth, and immunity, suggesting that these functions are most diversified in populations inhabiting different environments. Applying a differential network approach, we analyzed the network of protein interactions that differs between populations. Looking at specific population combinations of differential interaction, we identify sets of connected proteins. We find that these sets and their corresponding enriched functions reflect key processes that have diverged between the four populations. Moreover, the extent of divergence, i.e. the number of enriched functions that differ between populations, is highest when all three environmental parameters are different between two populations. Key nodes in the differential interaction network signify functions that are also inherent in the fingerprints, most prominently proteostasis-related functions. However, the differential interaction network also reveals additional functions that have diverged between populations, notably cytoskeletal organization and morphogenesis. Given such a large proteomic dataset, the strength of these analyses is that the results are purely data-driven, not based on previous findings and hypotheses about adaptation.
With such an unbiased approach applied to a large proteomic dataset, we find the strongest signals given by the data, making it possible to develop more discriminatory and complex biomarkers for specific contexts of interest.
Parisien-La Salle, S.; Heydarpour, M.; Tsai, C.-H.; Brown, J. M.; Newman, A. J.; Mahrokhian, S.; Hanna, I.; Honzel, B.; Tsai, L. C.; Waikar, S. S.; Inoue, K.; Zennaro, M.-C.; Auchus, R. J.; Turcu, A. F.; Williams, J. S.; Sacks, B.; Moussa, M.; Vaidya, A.
Primary aldosteronism (PA) is renin-independent aldosterone production that causes hypertension and cardiovascular disease. We investigated the proteomic evolution of PA from normotensive people with renin-independent aldosteronism to those with overt PA. The PA plasma proteome was characterized by pathways related to cardiovascular disease (inflammation, energy/redox, vascular remodeling). We identified proteins exhibiting dose-dependent trends paralleling the continuum of renin-independent aldosterone production; then, using adrenal vein proteomics, we identified proteins exhibiting the archetypal pattern of unilateral PA (peak abundance in the dominant vein with suppression in the contralateral vein). Among these, Norrin, a Wnt/β-catenin ligand previously identified as a risk locus for PA by GWAS, was robustly validated using functional testing (ACTH- and angiotensin II-induced interventions, and correlations with 18-hybrid steroids) and genetic testing (dose-dependent associations with NDP SNPs). The evolution of PA originates in normotensive people and is characterized by proteomic signatures of cardiovascular disease; Norrin is a novel regulator of PA pathophysiology.